

Abstract 


In this paper we analyse a fault-tolerant computer system. The failure/repair behavior of the system is modelled by an irreducible continuous-time Markov chain. Jobs arrive in a Poisson fashion to the system and are serviced according to an FCFS discipline. A failure may cause the loss of the work already done on the job in service, if any; in this case the interrupted job is repeated as soon as the system is ready to deliver service. In addition to the delays due to failures and repairs, jobs suffer delays due to queueing. We present a general queueing analysis of fault-tolerant systems and study the steady-state behavior of the number of jobs in the system. As a numerical example, we consider a system with two processors subject to failures and repairs.
1. Introduction

Queueing models provide a useful tool for predicting the performance of many service systems, including computer systems, telecommunication systems, computer/communication networks and flexible manufacturing systems. Traditional queueing models predict system performance under the assumption that all service facilities provide failure-free service [7]. It must, however, be acknowledged that service facilities do experience failures and that they get repaired. Failure/repair behavior of such systems is commonly modelled separately using techniques classified under reliability/availability modeling [3]. In recent years, it has been increasingly recognized that this separation of performance and reliability/availability models is no longer adequate [13].

Two distinct approaches towards combined modeling of performance and reliability/availability have been used. In the first approach, queueing models with server breakdowns and repairs are analyzed by means of generating functions [14], supplementary variables [2], imbedded Markov process and renewal theory [6], or probabilistic [17] techniques. These efforts generally carry out an exact steady-state queueing analysis of the system in the presence of breakdowns and repairs. A transient analysis of an M/G/1 queue with server breakdown, subject to a hard-deadline constraint on response time, has been considered recently [1]. The second approach is approximate: it is assumed that the time to reach the steady state is much smaller than the times to failures/repairs. Therefore, it is reasonable to associate a performance measure (reward) with each state of the underlying Markov (or semi-Markov) model describing the failure/repair behavior of the system. Each of these performance measures is obtained from the steady-state queueing analysis of the system in the corresponding state. The resulting reward model is then analyzed for the expected values or the distributions of interesting cumulative measures of system performance [8,9,10,13].

Since the job-oriented view of performance/reliability in fault-tolerant systems is particularly important, models have been developed to derive the distribution of job completion time in a failure-prone environment. In these models, we need to consider a possible loss of work due to the occurrence of a failure, i.e., the interrupted job may be resumed or restarted upon service resumption. We have recently considered models that take into account different types of interruptions in the analysis of the job completion time [8,9,10]. These earlier results are the bases for the queueing analysis presented in this paper. Note that the job completion time analysis includes the delays due to failures and repairs, but it does not account for queueing delays. The purpose of this paper is to extend our earlier analysis so as to account for the queueing delays. In effect, we consider an exact queueing analysis of the system in order to obtain the steady-state distribution and the mean of the number of jobs in the system.

We consider a queueing model of a computer system where the jobs arrive in a Poisson fashion with rate $\lambda$. The service requirements of the incoming jobs form a sequence of independent and identically distributed random variables with common cdf $G(\cdot)$. The computer system exists in one of $n$ possible states. The state of the computer system changes with time according to an independent continuous-time Markov chain. It is assumed that this chain is irreducible. When the computer system is in state $i$ it delivers service at rate $r_i \ge 0$. A state $i$ may be classified as preemptive-resume (prs) or preemptive-repeat-identical (pri) as follows: a state is said to be prs (pri) if, upon entering that state, the work done so far on the current job is preserved (lost), and the service is resumed (restarted) in the new state. Thus the actual time required to complete a job depends in a complex way upon the service requirement of the job and the evolution of the state of the computer system. It is assumed that there is an infinite waiting room for the jobs and that the service discipline is first come first served (FCFS). Note that even though the service requirements of jobs are independent and identically distributed, the actual times required to complete these jobs are neither independent nor identically distributed, and hence the model cannot be reduced to a standard M/G/1 queue [17].

When all the states of the model describing the computer system are prs, i.e., no work is ever lost, the problem described here can be analyzed as a queue in a random environment (e.g., see Purdue [18]). In the present paper we carry out the analysis with the possibility of work loss, i.e., when some of the states of the underlying Markov model are prs and the remaining are pri. As loss of work due to failures and interruptions is quite a common phenomenon in fault-tolerant computer systems, the model proposed here is of obvious interest. Many of the breakdown-repair queueing models studied in the literature are special cases of the model studied here (e.g., Mitrani [14], Nicola [17], Baccelli and Trivedi [2] and others).


In the next section we first study the situation with no queueing and state some results concerning the analysis of job completion time, which are direct extensions of those given in [9]. Using these results we set up the queueing model in Section 3 and show that it has the block M/G/1 structure. Queueing models with such a structure have been studied by Neuts [15], Neuts and Lucantoni [16], Ramaswami [19] and others. We demonstrate the usefulness of our approach by performing the numerical analysis for a particular example. This is done in Section 4. Finally, conclusions and some extensions are discussed in Section 5.

2. The Completion Time Analysis of a Single Job

Consider a single job with service requirement $B$ that starts getting served at time 0. $B$ is a random variable with distribution function $G(x) = P(B \le x)$ and LST $\tilde{G}(\cdot)$. Let $Z(t)$ be the state of the computer system at time $t$. It is assumed that $\{Z(t), t \ge 0\}$ is an irreducible continuous-time Markov chain on $\{1,2,\ldots,n\}$ with $n \times n$ generator matrix $Q = [q_{ij}]$, where $q_{ij}$ ($i \ne j$) is the transition rate from state $i$ to state $j$, and $q_{ii} = -\sum_{j \ne i} q_{ij}$. Thus, each row of $Q$ sums to zero. Let $T$ be the time when this job completes its service. Define, for $1 \le i,j \le n$,

\[ F_{ij}(t,x) = P(T \le t,\ Z(T)=j \mid Z(0)=i,\ B=x), \]

\[ \tilde{F}_{ij}(s,x) = E\left(e^{-sT};\ Z(T)=j \mid Z(0)=i,\ B=x\right) = \int_0^\infty e^{-st}\, dF_{ij}(t,x), \]

\[ \tilde{F}^*_{ij}(s,w) = \int_0^\infty e^{-wx}\, \tilde{F}_{ij}(s,x)\, dx. \]

Theorems 2.1 and 2.2 below are minor extensions of theorems 2 and 3 in [8]. They treat the cases where all the states are prs or all the states are pri, respectively. The proofs are similar to those in [8] and hence are omitted.

Theorem 2.1. Let all the states be of the prs type. The double transforms $\tilde{F}^*_{ij}(s,w)$, $1 \le i,j \le n$, are given by the unique solution to

\[ \tilde{F}^*_{ij}(s,w) = \frac{r_i\,\delta_{ij}}{s + q_i + r_i w} + \sum_{k \ne i} \frac{q_{ik}}{s + q_i + r_i w}\, \tilde{F}^*_{kj}(s,w), \qquad 1 \le i,j \le n, \tag{2.1} \]

where $q_i = -q_{ii}$, and $\delta_{ij} = 1$ if $i = j$ and 0 otherwise.
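For fixed $s$ and $w$, the equations of Theorem 2.1 form a linear system. The following is a minimal Python sketch, assuming the reading $\tilde{F}^*_{ij} = r_i\delta_{ij}/(s+q_i+r_i w) + \sum_{k\ne i} q_{ik}\tilde{F}^*_{kj}/(s+q_i+r_i w)$ of (2.1); the function name and the three-state generator are hypothetical and serve only as an illustration.

```python
import numpy as np

def prs_double_transform(Q, r, s, w):
    """Solve (2.1) for the n x n matrix [F*_ij(s, w)] when every state is prs.

    Multiplying (2.1) by (s + q_i + r_i w) gives the linear system
    (s I + w R - Q) F* = R, with R = diag(r_1, ..., r_n).
    """
    Q = np.asarray(Q, dtype=float)
    R = np.diag(np.asarray(r, dtype=float))
    n = Q.shape[0]
    return np.linalg.solve(s * np.eye(n) + w * R - Q, R)

# Hypothetical three-state generator and service rates (illustration only).
gamma, mu = 0.1, 1.0
Q = np.array([[-(gamma + mu), mu, gamma],
              [2 * gamma, -2 * gamma, 0.0],
              [mu, 0.0, -mu]])
r = [1.0, 2.0, 0.0]

w = 0.5
F_star = prs_double_transform(Q, r, s=0.0, w=w)
row_sums = F_star.sum(axis=1)  # at s = 0 each row should sum to 1/w
```

At $s = 0$ the job completes with probability one, so each row of the double transform sums to $\int_0^\infty e^{-wx}dx = 1/w$, which gives a quick sanity check. The column of a state with $r_i = 0$ is zero, since no job can complete there.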


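Theorem 2.2. Let all the states be of the pri type. The transforms $\tilde{F}_{ij}(s,x)$, $1 \le i,j \le n$, are given by the unique solution to

\[ \tilde{F}_{ij}(s,x) = e^{-(s+q_i)x/r_i}\,\delta_{ij} + \sum_{k \ne i} \frac{q_{ik}}{s+q_i}\left(1 - e^{-(s+q_i)x/r_i}\right) \tilde{F}_{kj}(s,x), \qquad 1 \le i,j \le n. \tag{2.2} \]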
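Theorem 2.2 can likewise be evaluated numerically for given $s$ and $x$. The sketch below assumes the form $\tilde{F}_{ij} = \delta_{ij}e^{-(s+q_i)x/r_i} + \sum_{k\ne i}\frac{q_{ik}}{s+q_i}(1-e^{-(s+q_i)x/r_i})\tilde{F}_{kj}$ and treats the exponential as 0 when $r_i = 0$; the function name and the example data are hypothetical.

```python
import numpy as np

def pri_transform(Q, r, s, x):
    """Solve the pri equations for the n x n matrix [F_ij(s, x)].

    Written in matrix form: (I - C) F = diag(e), where
    e_i  = exp(-(s + q_i) x / r_i)   (taken as 0 when r_i = 0) and
    C_ik = q_ik / (s + q_i) * (1 - e_i) for k != i.
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    q = -np.diag(Q)
    e = np.array([np.exp(-(s + q[i]) * x / r[i]) if r[i] > 0 else 0.0
                  for i in range(n)])
    C = np.zeros((n, n))
    for i in range(n):
        for k in range(n):
            if k != i:
                C[i, k] = Q[i, k] / (s + q[i]) * (1.0 - e[i])
    return np.linalg.solve(np.eye(n) - C, np.diag(e))

# Hypothetical three-state generator and service rates (illustration only).
gamma, mu = 0.1, 1.0
Q = np.array([[-(gamma + mu), mu, gamma],
              [2 * gamma, -2 * gamma, 0.0],
              [mu, 0.0, -mu]])
r = [1.0, 2.0, 0.0]

F0 = pri_transform(Q, r, s=0.0, x=0.01)
```

At $s = 0$ the entries are the probabilities that the job completes in state $j$, so each row of `F0` should sum to one.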


Next we consider the mixed case. Let $S \subset \{1,2,\ldots,n\}$ be the set of pri states, and $\bar{S} = \{1,2,\ldots,n\} - S$ be the set of prs states. Suppose $Z(0) \in \bar{S}$ and let

\[ U = \min\{t \ge 0 : Z(t) \in S\}. \]

Define

\[ V = \min\{T, U\}. \]

For $i \in \bar{S}$, define

\[ M_{ij}(t,x) = P(V \le t,\ Z(V)=j \mid Z(0)=i,\ B=x), \]

with the corresponding LST $\tilde{M}_{ij}(s,x)$, and the double transform

\[ \tilde{M}^*_{ij}(s,w) = \int_0^\infty e^{-wx}\, \tilde{M}_{ij}(s,x)\, dx. \]

Note that

\[ M_{ij}(t,x) = P(T \le t,\ T < U,\ Z(T)=j \mid Z(0)=i,\ B=x), \quad \text{if } j \in \bar{S}, \]

and

\[ M_{ij}(t,x) = P(U \le t,\ U < T,\ Z(U)=j \mid Z(0)=i,\ B=x), \quad \text{if } j \in S. \]

The following theorem is an extension of propositions 5.1 and 5.2 in [9]; the proof follows along the same lines, and hence is omitted.


Theorem 2.3. Let $S$ be the set of pri states and $\bar{S}$ be the set of prs states. Then

(i) The double transforms $\tilde{M}^*_{ij}(s,w)$, $i,j \in \bar{S}$, satisfy the following equations:

\[ \tilde{M}^*_{ij}(s,w) = \frac{r_i\,\delta_{ij}}{s + q_i + r_i w} + \sum_{k \in \bar{S},\, k \ne i} \frac{q_{ik}}{s + q_i + r_i w}\, \tilde{M}^*_{kj}(s,w), \qquad i,j \in \bar{S}. \tag{2.3} \]

(ii) The double transforms $\tilde{M}^*_{ij}(s,w)$, $i \in \bar{S}$, $j \in S$, satisfy the following equations:

\[ \tilde{M}^*_{ij}(s,w) = \frac{q_{ij}}{w\,(s + q_i + r_i w)} + \sum_{k \in \bar{S},\, k \ne i} \frac{q_{ik}}{s + q_i + r_i w}\, \tilde{M}^*_{kj}(s,w), \qquad i \in \bar{S},\ j \in S. \tag{2.4} \]

Equations (2.3) and (2.4) have unique solutions.

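The following theorem gives a method for determining $\tilde{F}_{ij}(s,x)$, $1 \le i,j \le n$, which is the main result of this section. The proof of this theorem, being similar to those of theorems 5.1 and 5.2 of [9], is omitted.

Theorem 2.4. Let the states in $S$ be pri and those in $\bar{S}$ be prs. Then

(i) The LSTs $\tilde{F}_{ij}(s,x)$, $i \in S$, satisfy the following equations:

\[ \tilde{F}_{ij}(s,x) = g_{ij}(s,x) + \sum_{l \in S} h_{il}(s,x)\, \tilde{F}_{lj}(s,x), \qquad i \in S,\ 1 \le j \le n, \tag{2.5} \]

where

\[ g_{ij}(s,x) = \begin{cases}
\delta_{ij}\, e^{-(s+q_i)x/r_i} + \displaystyle\sum_{k \in \bar{S}} \sum_{l \in \bar{S}} \delta_{lj}\, \frac{q_{ik}}{r_i} \int_0^x e^{-(s+q_i)h/r_i}\, \tilde{M}_{kl}(s,x-h)\, dh, & \text{if } r_i > 0, \\[2ex]
\displaystyle\sum_{k \in \bar{S}} \sum_{l \in \bar{S}} \delta_{lj}\, \frac{q_{ik}}{s+q_i}\, \tilde{M}_{kl}(s,x), & \text{if } r_i = 0,
\end{cases} \qquad i \in S,\ 1 \le j \le n, \]

and

\[ h_{il}(s,x) = \begin{cases}
(1-\delta_{il})\, \dfrac{q_{il}}{s+q_i}\left(1 - e^{-(s+q_i)x/r_i}\right) + \displaystyle\sum_{k \in \bar{S}} \frac{q_{ik}}{r_i} \int_0^x e^{-(s+q_i)h/r_i}\, \tilde{M}_{kl}(s,x-h)\, dh, & \text{if } r_i > 0, \\[2ex]
(1-\delta_{il})\, \dfrac{q_{il}}{s+q_i} + \displaystyle\sum_{k \in \bar{S}} \frac{q_{ik}}{s+q_i}\, \tilde{M}_{kl}(s,x), & \text{if } r_i = 0,
\end{cases} \qquad i,l \in S. \]

(ii) The LSTs $\tilde{F}_{ij}(s,x)$, $i \in \bar{S}$, are given by

\[ \tilde{F}_{ij}(s,x) = \sum_{l \in \bar{S}} \delta_{lj}\, \tilde{M}_{il}(s,x) + \sum_{l \in S} \tilde{M}_{il}(s,x)\, \tilde{F}_{lj}(s,x), \qquad i \in \bar{S},\ 1 \le j \le n. \tag{2.6} \]

Thus, for the mixed case, the job completion time is completely described by the LSTs $\tilde{F}_{ij}(s,x)$ given by theorem 2.4. These expressions are essential for the queueing analysis in the next section.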

3. The Queueing Model

In this section we perform the steady-state analysis of the queueing model described in the introduction. Let $X(t)$ be the number of jobs in the system (including any in service) at time $t$. Let $\tau_\nu$ be the time when the $\nu$-th job is completed. Assume that $\tau_0 = 0$ and a new job starts service at time 0.

Let $X_\nu = X(\tau_\nu+)$ and $Z_\nu = Z(\tau_\nu+)$ be the number of jobs and the state of the system, respectively, immediately after the $\nu$-th job completion. Due to the Poisson arrivals and the Markov nature of $\{Z(t), t \ge 0\}$, it is clear that $\{(X_\nu, Z_\nu), \nu \ge 0\}$ is a discrete-time Markov chain with state space $\{0,1,\ldots\} \times \{1,2,\ldots,m\}$, where $m = |\{i : r_i > 0\}|$, $m \le n$. (Note that a job may complete in state $i$ only if $r_i > 0$.) In this section we study the limiting distribution of $\{(X_\nu, Z_\nu), \nu \ge 0\}$. The relevance of this limiting distribution follows from the fact that jobs arrive singly to the system and depart singly from the system, and that the arrival process is Poisson. Therefore,

\[ \lim_{t \to \infty} P(X(t) = j) = \lim_{\nu \to \infty} P(X_\nu = j) \]

when the limits exist. This is a well-known theorem (see Cooper [5]).

Next we determine the one-step transition probability matrix of $\{(X_\nu, Z_\nu), \nu \ge 0\}$.

3.1. The Transition Probability Matrix

We first note that a job may start while the system is in any state, but it may complete only if the system is in a state with a positive service rate. From now on, we assume that $r_i > 0$ for $1 \le i \le m$ and $r_i = 0$ for $m < i \le n$. Assume that the unconditional LST

\[ \tilde{F}_{ij}(s) = \int_0^\infty \tilde{F}_{ij}(s,x)\, dG(x), \qquad 1 \le i \le n,\ 1 \le j \le m, \]

has been computed using the methods in Section 2. $F_{ij}(t)$ denotes the inverse of $\tilde{F}_{ij}(s)$, i.e.,

\[ F_{ij}(t) = P(T \le t,\ Z(T) = j \mid Z(0) = i), \qquad 1 \le i \le n,\ 1 \le j \le m. \]

Now let

\[ a_{ij}(k) = P(Z(T) = j,\ \text{number of arrivals during } (0,T] = k \mid Z(0) = i) = \int_0^\infty e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}\, dF_{ij}(t), \qquad k = 0,1,2,\ldots,\ 1 \le i \le n,\ 1 \le j \le m. \]

Define the $n \times m$ matrix $A^*(k) = [a_{ij}(k)]$, $k \ge 0$. Now, let $Y$ be an exponentially distributed random variable with parameter $\lambda$, which is independent of $\{Z(t), t \ge 0\}$. The following quantity will be needed in our analysis:

\[ d_{ij} = P(Z(Y) = j \mid Z(0) = i) = \lambda \int_0^\infty e^{-\lambda t}\, P(Z(t) = j \mid Z(0) = i)\, dt, \qquad 1 \le i,j \le n. \]

$d_{ij}$ is the probability that the system is in state $j$ at the time of the next arrival, given that the system was empty in state $i$. Recognizing the integral as the Laplace transform we get the following formula for the $n \times n$ matrix $D$ ($= [d_{ij}]$):

\[ D = \lambda\,[\lambda I - Q]^{-1}, \]

where $Q$ is the generator matrix of $\{Z(t), t \ge 0\}$ as defined in section 2. Using the above notation we give the following theorem.
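As a small illustration of the formula $D = \lambda[\lambda I - Q]^{-1}$, the sketch below evaluates $D$ with NumPy. The function name, the generator and the value of $\lambda$ are hypothetical and chosen only to have something concrete to run.

```python
import numpy as np

def idle_period_matrix(Q, lam):
    """Return D = lam * (lam*I - Q)^{-1}, the state-change probabilities
    over an exponentially distributed idle period with rate lam."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    return lam * np.linalg.inv(lam * np.eye(n) - Q)

# Hypothetical three-state generator (illustration only).
gamma, mu = 0.1, 1.0
Q = np.array([[-(gamma + mu), mu, gamma],
              [2 * gamma, -2 * gamma, 0.0],
              [mu, 0.0, -mu]])

D = idle_period_matrix(Q, lam=2.0)
```

Because the rows of $Q$ sum to zero, $(\lambda I - Q)e = \lambda e$, so every row of $D$ sums to one, as expected for a matrix of conditional probabilities.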

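Theorem 3.1.

\[ P(X_{\nu+1} = k',\ Z_{\nu+1} = j \mid X_\nu = k,\ Z_\nu = i) = \begin{cases}
a_{ij}(k'-k+1), & \text{if } k' \ge k-1 \ge 0, \\[1ex]
\displaystyle\sum_{l=1}^{n} d_{il}\, a_{lj}(k'), & \text{if } k' \ge k = 0, \\[1ex]
0, & \text{otherwise},
\end{cases} \qquad 1 \le i,j \le m. \]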


Proof.
(i) Let $k' \ge k-1 \ge 0$. Then $X_{\nu+1} = X_\nu - 1 +$ number of arrivals during $(\tau_\nu, \tau_{\nu+1}]$. The service of the $(\nu+1)$-th customer starts at time $\tau_\nu$ with the system in state $i$ and ends with the system in state $j$. Hence the required conditional probability is $a_{ij}(k'-k+1)$.

(ii) Let $k' \ge k = 0$. Thus the system is empty when the $\nu$-th job completes and the system is in state $i$. There follows an idle period $Y$ of exponential duration with parameter $\lambda$, during which time the state of the system changes to $l$ with probability $d_{il}$. The service of the $(\nu+1)$-th job starts in state $l$ and $X_{\nu+1} =$ number of arrivals during this service time. Hence the required conditional probability is given by $\sum_{l=1}^{n} d_{il}\, a_{lj}(k')$, and is denoted by $b_{ij}(k')$, $1 \le i,j \le m$.

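Let $F(t) = [F_{ij}(t)]$ ($i = 1,2,\ldots,n$; $j = 1,2,\ldots,m$) be an $n \times m$ matrix and $F^{(red)}(t)$ be the $m \times m$ submatrix of it obtained by taking its first $m$ rows. For $k \ge 0$, define the $n \times m$ matrices $A^*(k)$ and the $m \times m$ matrices $A(k)$ as follows:

\[ A^*(k) = \int_0^\infty e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}\, dF(t), \]

\[ A(k) = \int_0^\infty e^{-\lambda t}\, \frac{(\lambda t)^k}{k!}\, dF^{(red)}(t). \]

Also define the $m \times m$ matrices $B(k)$, $k \ge 0$, as follows:

\[ B(k) = D^{(red)} A^*(k), \]

where $D^{(red)}$ is the $m \times n$ submatrix of $D = \lambda(\lambda I - Q)^{-1}$ obtained by deleting its last $n-m$ rows.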

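Define the macro state $\mathbf{i} = \{(i,1),(i,2),\ldots,(i,m)\}$, $i = 0,1,2,\ldots$. The macro state $\mathbf{i}$ means that there are $i$ jobs in the system upon an arbitrary job completion. Using the state space $\{\mathbf{i} : i \ge 0\}$, we can write the one-step transition probability matrix of $\{(X_\nu, Z_\nu), \nu \ge 0\}$ as

\[ P = \begin{pmatrix}
B(0) & B(1) & B(2) & B(3) & \cdots \\
A(0) & A(1) & A(2) & A(3) & \cdots \\
0 & A(0) & A(1) & A(2) & \cdots \\
0 & 0 & A(0) & A(1) & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}, \]

where the block rows and block columns are indexed by the macro states $\mathbf{0}, \mathbf{1}, \mathbf{2}, \mathbf{3}, \ldots$.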
Notice that $Z_\nu$ can be in state $i$ if and only if $r_i > 0$. Therefore, $A(k)$ and $B(k)$, $k \ge 0$, are all $m \times m$ matrices, and all elements of $A(k)$ and $B(k)$, $k \ge 0$, are strictly positive.

From the above matrix representation, it is obvious that our model has a block M/G/1 structure. As mentioned before, there are a large number of models that fall into this structure and a general algorithm for the solution of this problem has been studied in detail by Lucantoni and Neuts [11], Lucantoni and Ramaswami [12] and Neuts [15]. Here, we use a modified version of the method of Lucantoni and Neuts [11].

Now let $A = \sum_{k=0}^{\infty} A(k) = \tilde{F}^{(red)}(0)$. Note that $A$ ($= [a_{ij}]$) is an irreducible stochastic matrix. Let $\pi$ be its invariant solution, i.e.,

\[ \pi^T = \pi^T A, \qquad \pi^T e = 1, \]

where $e$ is an $m$-dimensional column vector of 1s. We define

\[ \beta = \sum_{k=1}^{\infty} k\, A(k)\, e = \left[ -\lambda\, \frac{d}{ds} \tilde{F}^{(red)}(s) \Big|_{s=0} \right] e. \]

Note that $\pi^T \beta$ is the expected number of arrivals per departure at saturation (i.e., assuming that the system is never empty). Hence the condition of stability for the queueing system is given by (Neuts [15]):

\[ \rho = \pi^T \beta < 1, \]

which can be rewritten as

\[ \lambda < \lambda^*, \]

where $\lambda^*$ is the threshold value of the job arrival rate below which the queueing system will remain stable. We assume that the above condition is satisfied so that the Markov chain $\{(X_\nu, Z_\nu), \nu \ge 0\}$ is positive recurrent.
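The stability check can be sketched numerically as follows. The code computes the invariant vector of a stochastic matrix by appending the normalization to the balance equations, then forms the inner product with a vector of expected arrivals per service. The function names and the $2 \times 2$ data are hypothetical; they are not taken from the model above.

```python
import numpy as np

def invariant_vector(A):
    """Invariant probability vector pi of an irreducible stochastic matrix A:
    pi^T = pi^T A and pi^T e = 1, solved as one least-squares system."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    M = np.vstack([A.T - np.eye(m), np.ones((1, m))])
    rhs = np.zeros(m + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi

def traffic_intensity(A, beta):
    """rho = pi^T beta; the chain is positive recurrent when rho < 1."""
    return float(invariant_vector(A) @ np.asarray(beta, dtype=float))

# Hypothetical data (illustration only).
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
beta = np.array([0.5, 0.9])

pi = invariant_vector(A)          # (0.4, 0.6) for this A
rho = traffic_intensity(A, beta)  # 0.4*0.5 + 0.6*0.9 = 0.74
is_stable = rho < 1.0
```

For a two-state matrix with off-diagonal entries $p$ and $q$ the invariant vector is $(q, p)/(p+q)$, which is how the values in the comments were obtained.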


Now let

\[ y(i,j) = \lim_{\nu \to \infty} P(X_\nu = i,\ Z_\nu = j), \qquad i \ge 0,\ 1 \le j \le m. \]

The infinite probability vector $y^T$ is written as $y^T = (y_0^T, y_1^T, y_2^T, \ldots)$, where $y_i^T = (y(i,1), y(i,2), \ldots, y(i,m))$. Define $\phi_j(w) = \sum_{i=0}^{\infty} w^i\, y(i,j)$ ($j = 1,2,\ldots,m$) and let $\phi^T(w) = (\phi_1(w), \ldots, \phi_m(w))$. Then it can be easily shown that

\[ \phi^T(w) = y_0^T\, [wB(w) - A(w)]\, [wI - A(w)]^{-1}, \tag{3.1} \]

where

\[ B(w) = \sum_{k=0}^{\infty} w^k B(k) = D^{(red)}\, \tilde{F}(\lambda(1-w)), \]

\[ A(w) = \sum_{k=0}^{\infty} w^k A(k) = \tilde{F}^{(red)}(\lambda(1-w)). \]

The standard procedure at this point is to determine $y_0^T$ by complex function theory arguments based upon the holomorphic nature of $\phi^T(w)$, but this procedure is numerically unstable. Lucantoni and Neuts [11] have developed a more stable procedure to obtain $y_0^T$. We use a modified version of this procedure, which is described here for completeness.

Let $V'$ be the length of a busy period initiated by a single customer at time 0. Define

\[ H_{ij} = P(Z(V') = j \mid Z(0) = i), \qquad i,j = 1,2,\ldots,m, \]

to be the probability that the system state changes from $i$ to $j$ during a busy period. It is known that the matrix $H = [H_{ij}]$ is the smallest solution to the following nonlinear equation:

\[ H = \sum_{k=0}^{\infty} A(k)\, H^k = \int_0^\infty dF^{(red)}(t)\, \exp(\lambda(H - I)t). \tag{3.2} \]

Equation (3.2) can be solved by a straightforward iterative method:

\[ H_0 = I, \qquad H_{n+1} = \int_0^\infty dF^{(red)}(t)\, \exp(\lambda(H_n - I)t). \tag{3.3} \]

In the limit as $n$ approaches infinity, $H_n$ of equation (3.3) approaches $H$, the solution to (3.2). Notice that when the matrix exponential in the above equation is computed by an eigenvalue technique, equation (3.3) gives $H_{n+1}$ in terms of $\tilde{F}_{ij}(\cdot)$ evaluated at the eigenvalues of $H_n$. This method is used in the example of the next section, and it obviates the need to compute $A(k)$ for all $k$. It should be noted that $H$ is a stochastic matrix.
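The iteration can be sketched in Python using the power-series form $H = \sum_k A(k)H^k$ of (3.2) with finitely many blocks, rather than the integral form with an eigenvalue technique described above. The function name and the blocks $A(0), A(1), A(2)$ below are hypothetical: they are chosen only so that $\sum_k A(k)$ is stochastic and the mean number of arrivals per service is below one.

```python
import numpy as np

def solve_H(A_blocks, tol=1e-13, max_iter=100000):
    """Fixed-point iteration H_{n+1} = sum_k A(k) H_n^k, started at H_0 = I."""
    m = A_blocks[0].shape[0]
    H = np.eye(m)
    for _ in range(max_iter):
        # Horner scheme: A0 + (A1 + (A2 + ...) H) H
        H_next = np.zeros((m, m))
        for A_k in reversed(A_blocks):
            H_next = A_k + H_next @ H
        if np.max(np.abs(H_next - H)) < tol:
            return H_next
        H = H_next
    return H

# Hypothetical blocks (illustration only); each row of A0 + A1 + A2 sums to 1
# and the mean number of arrivals per service is 0.3 + 2*0.1 = 0.5 < 1.
A_blocks = [np.array([[0.40, 0.20], [0.30, 0.30]]),
            np.array([[0.20, 0.10], [0.10, 0.20]]),
            np.array([[0.05, 0.05], [0.05, 0.05]])]

H = solve_H(A_blocks)
residual = np.max(np.abs(sum(A_k @ np.linalg.matrix_power(H, k)
                             for k, A_k in enumerate(A_blocks)) - H))
```

Starting from $H_0 = I$ keeps every iterate stochastic, so the row sums of the result give a simple check alongside the residual of the fixed-point equation.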

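It is known that $y_0^T$ is a solution to

\[ y_0^T = y_0^T \left( \sum_{k=0}^{\infty} B(k)\, H^k \right) = y_0^T\, D^{(red)} \int_0^\infty dF(t)\, \exp(\lambda(H - I)t). \tag{3.4} \]

The above equation determines $y_0^T$ up to a multiplicative constant, since $\sum_{k=0}^{\infty} B(k) H^k$ is a stochastic matrix, so that $I - \sum_{k=0}^{\infty} B(k) H^k$ has rank $m-1$. Again, the matrix $\sum_{k=0}^{\infty} B(k) H^k$ can be computed in terms of $\tilde{F}_{ij}(\cdot)$ evaluated at the eigenvalues of $H$.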

At this point, Lucantoni and Neuts provide a rather formidable procedure to compute this multiplicative constant. Here we provide an alternative method, which is based upon the following equation:

\[ \lim_{w \to 1} \phi^T(w)\, e = 1. \tag{3.5} \]

Unfortunately, $wI - A(w)$ is singular in the limit as $w \to 1$, and hence we need to use L'Hospital's rule to compute the limit in (3.5). To do this, write

\[ [wI - A(w)]^{-1} = R(w)/u(w), \]

where $R(w)$ is the adjoint of $wI - A(w)$ and $u(w)$ is the determinant of $wI - A(w)$. Then we get

\[ u'(1) = y_0^T \left\{ \left( B(1) + B'(1) - A'(1) \right) R(1) + \left( B(1) - A(1) \right) R'(1) \right\} e, \tag{3.6} \]

where

\[ A(1) = \tilde{F}^{(red)}(0), \qquad B(1) = D^{(red)}\, \tilde{F}(0), \]

\[ A'(1) = -\lambda\, \frac{d}{ds} \tilde{F}^{(red)}(s) \Big|_{s=0}, \qquad B'(1) = -\lambda\, D^{(red)}\, \frac{d}{ds} \tilde{F}(s) \Big|_{s=0}. \]

Equation (3.6) above provides the required independent equation to determine the multiplicative constant. Once $y_0^T$ is known, $\phi^T(w)$ is completely determined and one can compute moments by taking derivatives.

One seeming difficulty of this procedure is the apparent necessity of having to compute $R(w)$ and $u(w)$ algebraically. As we only need $u(1)$, $u'(1)$, $R(1)$, $R'(1)$, we can use the following theorems, which eliminate the necessity of computing $R(w)$ and $u(w)$ algebraically.

Theorem 3.1. Let $G(w) = [g_{ij}(w)]$ be an $m \times m$ matrix of differentiable functions. For $k = 1,2,\ldots,m$, define the $m \times m$ matrices $G^{(k)}(w)$ as follows:

\[ [G^{(k)}(w)]_{ij} = \begin{cases} g_{ij}(w), & \text{if } i \ne k, \\[1ex] \dfrac{d}{dw}\, g_{ij}(w), & \text{if } i = k. \end{cases} \]

Then

\[ \frac{d}{dw} \left( \det G(w) \right) = \sum_{k=1}^{m} \det G^{(k)}(w). \]

Theorem 3.2. Let $G(w)$ and $G^{(k)}(w)$ be as defined in Theorem 3.1. For $k = 1,2,\ldots,m$, define the $m \times m$ matrices $G_{(k)}(w)$ as follows:

\[ [G_{(k)}(w)]_{ij} = \begin{cases} g_{ij}(w), & \text{if } i \ne k, \\[1ex] 0, & \text{if } i = k. \end{cases} \]

Then

\[ \frac{d}{dw} \left[ \mathrm{Adj}\, G(w) \right] = \sum_{k=1}^{m} \left\{ \mathrm{Adj}\, G^{(k)}(w) - \mathrm{Adj}\, G_{(k)}(w) \right\}. \]

Proofs of both these theorems are straightforward. These theorems provide an $O(m^4)$ method of computing $u'(1)$ and $R'(1)$.
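The two differentiation rules can be checked numerically against finite differences. The sketch below assumes the row-replacement reading of the theorems (row $k$ replaced by its derivative, or by zeros) and uses a hypothetical $3 \times 3$ matrix function. The cofactor-based `adjugate` helper is a naive implementation chosen for clarity, so it does not achieve the $O(m^4)$ cost mentioned above.

```python
import numpy as np

def adjugate(M):
    """Adjugate (transpose of the cofactor matrix); valid for singular M too."""
    m = M.shape[0]
    C = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

def det_and_adj_derivatives(G, dG):
    """Derivatives of det G(w) and Adj G(w) from the values of G and of its
    entrywise derivative dG at a single point w."""
    m = G.shape[0]
    d_det, d_adj = 0.0, np.zeros((m, m))
    for k in range(m):
        G_up = G.copy()
        G_up[k, :] = dG[k, :]   # row k replaced by its derivative
        G_lo = G.copy()
        G_lo[k, :] = 0.0        # row k replaced by zeros
        d_det += np.linalg.det(G_up)
        d_adj += adjugate(G_up) - adjugate(G_lo)
    return d_det, d_adj

# Hypothetical matrix function and its entrywise derivative (illustration only).
def G_fun(w):
    return np.array([[w, w**2, 1.0], [2.0, w, w**3], [w, 1.0, 2 * w]])

def dG_fun(w):
    return np.array([[1.0, 2 * w, 0.0], [0.0, 1.0, 3 * w**2], [1.0, 0.0, 2.0]])

w0, h = 0.7, 1e-5
d_det, d_adj = det_and_adj_derivatives(G_fun(w0), dG_fun(w0))
fd_det = (np.linalg.det(G_fun(w0 + h)) - np.linalg.det(G_fun(w0 - h))) / (2 * h)
fd_adj = (adjugate(G_fun(w0 + h)) - adjugate(G_fun(w0 - h))) / (2 * h)
```

In the setting above one would take $G(w) = wI - A(w)$ and evaluate at $w = 1$, where only $A(1)$ and $A'(1)$ are needed.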

An interesting feature of this queueing model is that the expected service time of an arbitrary job in steady state depends upon the load offered to the system, viz. $\lambda$. We can easily derive expressions for this quantity when $\lambda$ approaches 0 (a lightly loaded system) or as $\lambda$ approaches $\lambda^*$ (a heavily loaded system). Let $S_\nu$ be the service time of the $\nu$-th customer. When the arrival rate $\lambda \to 0$, every incoming job finds the system empty. Thus, in steady state, an incoming job finds the structure state process in state $i$ with probability $\theta_i = \lim_{t \to \infty} P(Z(t) = i)$. Hence

\[ \lim_{\lambda \to 0}\, \lim_{\nu \to \infty} E(S_\nu) = \sum_{i=1}^{n} \theta_i\, E(T(x) \mid Z(0) = i), \]

where $\theta^T = (\theta_1, \theta_2, \ldots, \theta_n)$ is a solution to

\[ \theta^T Q = 0, \qquad \theta^T e = 1, \]

and

\[ E(T(x) \mid Z(0) = i) = -\frac{d}{ds} \sum_{j=1}^{m} \tilde{F}_{ij}(s,x) \Big|_{s=0}. \]

When the arrival rate $\lambda \to \lambda^*$, the system is always busy. Hence, when a job comes up for service, the structure state process is in state $i$ with probability $\pi_i$, where $\pi^T = (\pi_1, \ldots, \pi_m)$ is the stationary probability vector of the matrix $A$ as defined before. Hence

\[ \lim_{\lambda \to \lambda^*}\, \lim_{\nu \to \infty} E(S_\nu) = \sum_{i=1}^{m} \pi_i\, E(T(x) \mid Z(0) = i) = \frac{1}{\lambda^*}. \]

In the next section we tabulate $\lim_{\nu \to \infty} E(S_\nu)$ as a function of $\lambda$ for a two-processor fault-tolerant system. We also study the expected queue length in such a system as a function of $\lambda$.

4.  An  Example 

In this section we consider an example to demonstrate the use of the techniques presented in section 3. We obtain the mean of the number of jobs in a fault-tolerant computer system in steady state. The system has two processor units subject to failures and repairs. The failure rate of a single processor is $\gamma$. The failure of one processor causes the preemption of the job being processed. The interrupted job is restarted and processed at a reduced service rate (the service rate is assumed to be proportional to the number of operating processors). When both processors have failed, the interrupted job is restarted as soon as one of the processors is repaired and is processed at a reduced service rate. When the second processor is repaired the processing of the job is continued at the increased (normal) service rate. The failed processors are repaired one at a time with a rate $\mu$.

The behavior of the system can be described by a continuous-time Markov chain with the state-transition diagram shown in figure 1. Note that state 2 corresponds to the system with two operating processors, and is classified as a prs state. The service rate in state 2 is $r_2$ ($= 2$). State 1 corresponds to the system with one operating processor, and is classified as a pri state. The service rate in state 1 is $r_1$ ($= 1$). State 3 corresponds to the system with both processors failed, and is classified as a pri state. The service rate in state 3 is $r_3$ ($= 0$). Jobs arrive into the system according to a Poisson process at a rate $\lambda$. Each job has a deterministic work requirement, say $x$ units of work.
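The failure/repair chain of this example can be written down directly from the description above: each operating processor fails at rate $\gamma$ and failed processors are repaired one at a time at rate $\mu$. The sketch below builds the generator with the states ordered 1, 2, 3 and computes its stationary distribution; the function names and the parameter values passed to them are assumptions made for illustration.

```python
import numpy as np

def example_generator(gamma, mu):
    """Generator Q, states ordered (1: one processor up, 2: both up, 3: none up).

    Transitions as read from the text: 2 -> 1 at rate 2*gamma,
    1 -> 3 at rate gamma, 1 -> 2 at rate mu, 3 -> 1 at rate mu.
    """
    return np.array([[-(gamma + mu), mu, gamma],
                     [2 * gamma, -2 * gamma, 0.0],
                     [mu, 0.0, -mu]])

def stationary_distribution(Q):
    """Solve theta^T Q = 0 with theta^T e = 1 as one least-squares system."""
    n = Q.shape[0]
    M = np.vstack([Q.T, np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    theta, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return theta

Q = example_generator(gamma=0.1, mu=1.0)
r = np.array([1.0, 2.0, 0.0])      # service rates r_1, r_2, r_3
theta = stationary_distribution(Q)
```

For $\gamma = 0.1$ and $\mu = 1$ the balance equations give $\theta_1 = 0.2\,\theta_2$ and $\theta_3 = 0.02\,\theta_2$, so the system has both processors up about 82% of the time.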
We now follow the procedure suggested in section 2 in order to compute $\tilde{F}_{ij}(s,x)$, $i = 1,2,3$; $j = 1,2$. (Note that a job may complete only in a state with nonzero service rate.) In our example there are two types of system states, the prs subset $\bar{S} = \{2\}$, and the pri subset $S = \{1,3\}$. From theorem 2.3, equations (2.3) and (2.4) yield

\[ \tilde{M}^*_{22}(s,w) = \frac{2}{s + 2\gamma + 2w} \tag{4.1} \]

and

\[ \tilde{M}^*_{21}(s,w) = \frac{2\gamma}{w\,(s + 2\gamma + 2w)}. \tag{4.2} \]

Inverting with respect to $w$, we get

\[ \tilde{M}_{22}(s,x) = e^{-(s+2\gamma)x/2} \tag{4.3} \]

and

\[ \tilde{M}_{21}(s,x) = \left( \frac{2\gamma}{s + 2\gamma} \right) \left( 1 - e^{-(s+2\gamma)x/2} \right). \tag{4.4} \]
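The LSTs $\tilde{F}_{ij}(s,x)$, $i = 1,3$; $j = 1,2$, can be determined from theorem 2.4, equation (2.5), as follows:

\[ \tilde{F}_{31}(s,x) = \left( \frac{\mu}{s+\mu} \right) \tilde{F}_{11}(s,x), \tag{4.5} \]

\[ \tilde{F}_{32}(s,x) = \left( \frac{\mu}{s+\mu} \right) \tilde{F}_{12}(s,x), \tag{4.6} \]

where

\[ \tilde{F}_{11}(s,x) = \frac{e^{-(s+\gamma+\mu)x}}{D(s,x)} \tag{4.7} \]

and

\[ \tilde{F}_{12}(s,x) = \frac{\left( \dfrac{2\mu}{s+2\mu} \right) \left( e^{-(s+2\gamma)x/2} - e^{-(s+\gamma+\mu)x} \right)}{D(s,x)}, \tag{4.8} \]

with

\[ D(s,x) = 1 - \frac{\gamma\mu\,(3s + 2\gamma + 2\mu)\left(1 - e^{-(s+\gamma+\mu)x}\right)}{(s+\mu)(s+2\gamma)(s+\gamma+\mu)} + \frac{4\gamma\mu\left(e^{-(s+2\gamma)x/2} - e^{-(s+\gamma+\mu)x}\right)}{(s+2\gamma)(s+2\mu)}. \]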


The LSTs $\tilde{F}_{ij}(s,x)$, $i = 2$, $j = 1,2$, follow from theorem 2.4, equation (2.6), as follows:

\[ \tilde{F}_{21}(s,x) = \tilde{M}_{21}(s,x)\, \tilde{F}_{11}(s,x), \tag{4.9} \]

\[ \tilde{F}_{22}(s,x) = \tilde{M}_{22}(s,x) + \tilde{M}_{21}(s,x)\, \tilde{F}_{12}(s,x), \tag{4.10} \]

with $\tilde{M}_{22}(s,x)$ and $\tilde{M}_{21}(s,x)$ as given by equations (4.3) and (4.4), respectively.
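The closed forms (4.3)-(4.10) are easy to evaluate numerically. The following sketch assumes the expression $D(s,x) = 1 - \gamma\mu(3s+2\gamma+2\mu)(1-e^{-(s+\gamma+\mu)x})/[(s+\mu)(s+2\gamma)(s+\gamma+\mu)] + 4\gamma\mu(e^{-(s+2\gamma)x/2}-e^{-(s+\gamma+\mu)x})/[(s+2\gamma)(s+2\mu)]$ for the common denominator; the function name and parameter values are hypothetical.

```python
import numpy as np

def completion_lst(s, x, gamma, mu):
    """Return the 3 x 2 array [F_ij(s, x)], i = 1, 2, 3; j = 1, 2."""
    e1 = np.exp(-(s + gamma + mu) * x)      # exp(-(s + gamma + mu) x)
    e2 = np.exp(-(s + 2 * gamma) * x / 2)   # exp(-(s + 2 gamma) x / 2)
    M22 = e2                                              # (4.3)
    M21 = 2 * gamma / (s + 2 * gamma) * (1 - e2)          # (4.4)
    D = (1
         - gamma * mu * (3 * s + 2 * gamma + 2 * mu) * (1 - e1)
           / ((s + mu) * (s + 2 * gamma) * (s + gamma + mu))
         + 4 * gamma * mu * (e2 - e1) / ((s + 2 * gamma) * (s + 2 * mu)))
    F11 = e1 / D                                          # (4.7)
    F12 = 2 * mu / (s + 2 * mu) * (e2 - e1) / D           # (4.8)
    F21 = M21 * F11                                       # (4.9)
    F22 = M22 + M21 * F12                                 # (4.10)
    F31 = mu / (s + mu) * F11                             # (4.5)
    F32 = mu / (s + mu) * F12                             # (4.6)
    return np.array([[F11, F12], [F21, F22], [F31, F32]])

gamma, mu, x = 0.1, 1.0, 0.01
F0 = completion_lst(0.0, x, gamma, mu)   # completion-state probabilities
```

At $s = 0$ the rows must sum to one, and the first row reduces to $(e^{-\mu x},\ 1 - e^{-\mu x})$, which offers a quick consistency check on the formulas.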

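Now that we have evaluated the LSTs $\tilde{F}_{ij}(s,x)$, $i = 1,2,3$; $j = 1,2$, we can carry out the queueing analysis as suggested in section 3. The matrix $A$ is given by

\[ A = \sum_{k=0}^{\infty} A(k) = [\tilde{F}_{ij}(0,x)], \qquad i,j = 1,2, \]

and can now be evaluated using equations (4.7)-(4.10). It follows that

\[ A = \begin{pmatrix} e^{-\mu x} & 1 - e^{-\mu x} \\ e^{-\mu x} - e^{-(\gamma+\mu)x} & 1 - e^{-\mu x} + e^{-(\gamma+\mu)x} \end{pmatrix}. \]

As noted earlier, $A$ is an irreducible stochastic matrix. Let $\pi$ be the solution to $\pi^T = \pi^T A$ and $\pi^T e = 1$; then

\[ \pi_1 = \frac{e^{-\mu x} - e^{-(\gamma+\mu)x}}{1 - e^{-(\gamma+\mu)x}}, \qquad \pi_2 = \frac{1 - e^{-\mu x}}{1 - e^{-(\gamma+\mu)x}}. \]

The condition of stability for the present queueing system is given by $\pi^T \beta < 1$, with

\[ \beta_i = \sum_{k=1}^{\infty} k\,[a_{i1}(k) + a_{i2}(k)] = -\lambda\,[\tilde{F}'_{i1}(0,x) + \tilde{F}'_{i2}(0,x)], \qquad i = 1,2. \]

Thus, we have, after evaluating $\tilde{F}'_{ij}(0,x)$, $i,j = 1,2$, using equations (4.7)-(4.10),

\[ \beta_1 = \lambda \left[ \frac{(\gamma+\mu)^2 + \gamma^2}{2\gamma\mu(\gamma+\mu)} \left( e^{\gamma x} - e^{-\mu x} \right) - \frac{1}{2\gamma} \left( 1 - e^{-\mu x} \right) \right] \]

and

\[ \beta_2 = \lambda \left[ \frac{(\gamma+\mu)^2 + \gamma^2}{2\gamma\mu(\gamma+\mu)} \left( e^{\gamma x} + e^{-(\gamma+\mu)x} - e^{-\mu x} - 1 \right) - \frac{1}{2\gamma} \left( e^{-(\gamma+\mu)x} - e^{-\mu x} \right) \right]. \]

The condition of stability follows ($\rho < 1$):

\[ \lambda < \frac{2\gamma\mu(\gamma+\mu)}{\left( (\gamma+\mu)^2 + \gamma^2 \right) \left( e^{\gamma x} - 1 \right)} = \lambda^*. \]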
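The threshold arrival rate is a one-line computation. The sketch below assumes the reading $\lambda^* = 2\gamma\mu(\gamma+\mu)/[((\gamma+\mu)^2+\gamma^2)(e^{\gamma x}-1)]$ of the stability condition; the function name is hypothetical.

```python
import math

def threshold_rate(gamma, mu, x):
    """lambda* = 2 gamma mu (gamma + mu) / (((gamma + mu)^2 + gamma^2)(e^{gamma x} - 1)).

    math.expm1 is used for e^{gamma x} - 1 to stay accurate when gamma*x is small.
    """
    numerator = 2 * gamma * mu * (gamma + mu)
    denominator = ((gamma + mu) ** 2 + gamma ** 2) * math.expm1(gamma * x)
    return numerator / denominator

lam_star = threshold_rate(gamma=0.1, mu=1.0, x=0.01)
```

With $\gamma = 0.1$, $\mu = 1$ and $x = 0.01$ this evaluates to about 180.24, the threshold quoted for the tabulated case later in this section.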

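Now we proceed to determine the mean of the number of jobs in the system, in steady state. Let the irreducible stochastic matrix $H$ (defined in section 3) be given by

\[ H = \begin{pmatrix} \delta & 1-\delta \\ 1-\theta & \theta \end{pmatrix}. \]

It is determined as the smallest solution to

\[ H = \int_0^\infty dF(t,x)\, e^{\lambda(H-I)t}, \tag{4.16} \]

with $dF(t,x) = [dF_{ij}(t,x)]$, $i,j = 1,2$.

It is easy to show that

\[ e^{\lambda(H-I)t} = \frac{\lambda}{\alpha} \begin{pmatrix} (1-\theta) + (1-\delta)e^{-\alpha t} & (1-\delta) + (\delta-1)e^{-\alpha t} \\ (1-\theta) + (\theta-1)e^{-\alpha t} & (1-\delta) + (1-\theta)e^{-\alpha t} \end{pmatrix}, \tag{4.17} \]

where $\alpha = \lambda(2-\delta-\theta)$. Hence, substituting (4.17) in (4.16), we get:

\[ H = \frac{\lambda}{\alpha} \begin{pmatrix} (1-\theta) + (1-\delta)\tilde{F}_{11}(\alpha,x) + (\theta-1)\tilde{F}_{12}(\alpha,x) & (1-\delta) + (\delta-1)\tilde{F}_{11}(\alpha,x) + (1-\theta)\tilde{F}_{12}(\alpha,x) \\ (1-\theta) + (1-\delta)\tilde{F}_{21}(\alpha,x) + (\theta-1)\tilde{F}_{22}(\alpha,x) & (1-\delta) + (\delta-1)\tilde{F}_{21}(\alpha,x) + (1-\theta)\tilde{F}_{22}(\alpha,x) \end{pmatrix}. \]

The unknowns $\delta$ and $\theta$ can be determined by solving the following two nonlinear equations:

\[ \frac{\alpha}{\lambda}\,\delta = (1-\theta) + (1-\delta)\tilde{F}_{11}(\alpha,x) + (\theta-1)\tilde{F}_{12}(\alpha,x), \tag{4.18} \]

\[ \frac{\alpha}{\lambda}\,\theta = (1-\delta) + (1-\theta)\tilde{F}_{22}(\alpha,x) + (\delta-1)\tilde{F}_{21}(\alpha,x), \tag{4.19} \]

with $\tilde{F}_{ij}(\alpha,x)$, $i,j = 1,2$, from equations (4.7)-(4.10).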
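The two unknowns can also be found by plain fixed-point iteration on $H$, which is simpler (though typically slower) than a quasi-Newton solver. The sketch below is such a substitute, not the solver used for the reported results. It assumes the closed forms (4.3)-(4.10) as read in the code comments, writes $e^{\lambda(H-I)t} = E_0 + E_1 e^{-\alpha t}$ so that $H = \tilde{F}(0,x)E_0 + \tilde{F}(\alpha,x)E_1$, and uses hypothetical function names and an arbitrary arrival rate below the stability threshold.

```python
import numpy as np

def F_red(s, x, gamma, mu):
    """2 x 2 matrix [F_ij(s, x)], i, j = 1, 2, from (4.3)-(4.10)."""
    e1 = np.exp(-(s + gamma + mu) * x)
    e2 = np.exp(-(s + 2 * gamma) * x / 2)
    M21 = 2 * gamma / (s + 2 * gamma) * (1 - e2)
    D = (1
         - gamma * mu * (3 * s + 2 * gamma + 2 * mu) * (1 - e1)
           / ((s + mu) * (s + 2 * gamma) * (s + gamma + mu))
         + 4 * gamma * mu * (e2 - e1) / ((s + 2 * gamma) * (s + 2 * mu)))
    F11 = e1 / D
    F12 = 2 * mu / (s + 2 * mu) * (e2 - e1) / D
    return np.array([[F11, F12], [M21 * F11, e2 + M21 * F12]])

def H_update(H, lam, x, gamma, mu):
    """One step of H <- integral dF(t, x) exp(lam (H - I) t) for a 2 x 2 H."""
    a, b = 1 - H[0, 0], 1 - H[1, 1]          # a = 1 - delta, b = 1 - theta
    alpha = lam * (a + b)
    E0 = np.array([[b, a], [b, a]]) / (a + b)     # constant part of the exponential
    E1 = np.array([[a, -a], [-b, b]]) / (a + b)   # coefficient of exp(-alpha t)
    return F_red(0.0, x, gamma, mu) @ E0 + F_red(alpha, x, gamma, mu) @ E1

def solve_H(lam, x, gamma, mu, tol=1e-13, max_iter=100000):
    H = F_red(0.0, x, gamma, mu)   # first iterate when starting from H_0 = I
    for _ in range(max_iter):
        H_new = H_update(H, lam, x, gamma, mu)
        if np.max(np.abs(H_new - H)) < tol:
            return H_new
        H = H_new
    return H

gamma, mu, x = 0.1, 1.0, 0.01
lam = 90.0                         # hypothetical rate, about half the threshold
H = solve_H(lam, x, gamma, mu)
delta, theta = H[0, 0], H[1, 1]
residual = np.max(np.abs(H_update(H, lam, x, gamma, mu) - H))
```

The iteration starts from $\tilde{F}(0,x)$ because the step from $H_0 = I$ gives exactly that matrix, and starting there avoids the division by $2-\delta-\theta = 0$.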


Equations (4.18) and (4.19) are solved using Broyden's method, which converges quickly for judiciously chosen initial solution vectors. Then the vector $y_0$ is solved for by using (3.4) and (3.6). Using $y_0$ we then compute $\phi^T(w)$ from (3.1). By the method described in section 3, we are then able to compute the expected number of jobs in the system in the steady state. We plot the expected number of jobs as a function of $\lambda$ in figure 2 for $x = 0.01$, $\mu = 1$ and three different values of the failure rate, $\gamma = 0.01, 0.05, 0.1$. As expected, increasing the failure rate $\gamma$ implies a substantial increase in system congestion.


In the following table, we give the expected service time in the steady state as a function of $\lambda/\lambda^*$ for $\gamma = 0.1$, $\mu = 1$ and $x = 0.01$. In this case the threshold arrival rate $\lambda^*$ is 180.24. The expected service time of an arbitrary job in steady state is denoted by $E(S)$.

    lambda/lambda*    E(S) x 10^3
    0.99              5.55
    0.90              5.60
    0.80              5.68
    0.70              5.77
    0.60              5.88
    0.50              5.98
    0.40              6.05
    0.30              6.14
    0.20              6.30
    0.10              6.71
    0.00              10.9

It is seen that the expected service time reduces from 0.0199 to 0.0055 as $\lambda$ increases from 0 to $0.99\,\lambda^*$. This seemingly non-intuitive result appears because as $\lambda$ increases, the probability that a job will be taken up for service when both processors are down decreases.